An Interest Point Detector and Local Image Descriptor for 3D Rigid Scenes
Abstract
Approaches for detecting points of interest (POI) have been applied to many problems in computer vision, such as motion tracking, image registration, object recognition, video indexing, and even texture classification. Since they have been accompanied by corresponding local image descriptors, they have received growing attention. Current approaches remain entirely within the two dimensions of the image plane, although they are used in applications such as wide baseline stereo matching or multiple viewpoint object recognition, where 3D transformations occur that can be approximated by 2D transformations only for highly planar scenes. Accordingly, it has been shown that current approaches have problems with complex 3D scenes. In the proposed thesis, an interest point detector and a corresponding local image descriptor will be developed that are invariant against (local) 3D rigid transformations. Currently, the feasibility of a shape from shading approach is being investigated. The thesis is expected to be finished in 2008.

In image processing, a point of interest (POI) is a point in an image that has special properties which make it stand out from its neighbouring points. What exactly these properties are differs between POI detection approaches. Pritchett and Zisserman refer to interest points as “reliable generic visual primitives” [1]. They are often also called salient points. Most POI detection approaches share the following common properties:

– Only a small fraction of all image points is regarded as points of interest.
– Points of interest denote image points with high information content. Here, the notion of information content is mostly relatively low-level, based on information-theoretic considerations. Because a single image point does not contain much information, the direct neighbourhood of a point is also considered during the detection process.
Thus, in practice, an interest point denotes an image point as the center of a local image patch.
– Interest points are therefore more distinctive than the average image point. This property is exploited by several applications of interest point detectors.
– Approaches to detect points of interest are often accompanied by corresponding local image descriptors, which comprise the information of interest points and their neighbourhood image patch in a compact form.

⋆ Advisor: Prof. Dr. Otthein Herzog

POI detection approaches have long been used in image processing. Because of their distinctive nature, they are used in motion estimation and tracking, where they minimize the so-called aperture problem [2]. In shape from stereo approaches, points of interest are used for matching two widely separated views [1, 3]. They also play an important role in object recognition [4, 5], where object classes are described as sets of local features with spatial relations between them. Due to their local nature, interest points and local image descriptors make it possible to deal with occlusions. Furthermore, they remove the need for a preceding segmentation, which is a very difficult problem in its own right.

Recently, local image descriptors have been used for efficient indexing and retrieval of images and videos [6]. There, instances of a local image descriptor for a huge set of interest points on real-world images and videos are clustered to form a kind of visual vocabulary. Each cluster corresponds to a word. Then, established and efficient text retrieval algorithms are used to index video data, using an inverted file approach. Another recent application of local image descriptors is the representation and classification of texture [7].

In the proposed thesis, a new approach for the detection of points of interest and a corresponding local image descriptor shall be developed that can be applied to 3D rigid scenes.
That is, the approach shall be able to cope with 3D rigid transformations of the image data. In the next section, the goals and preconditions of the thesis are formulated in further detail. Existing approaches related to the thesis are covered in Sec. 2. Section 3 introduces the proposed approach. The paper concludes with a summary and an outlook (Sec. 4).

1 Preconditions and problem formulation

Two problems are addressed in this thesis:

– The problem of creating an interest point detector that is repeatable under 3D rigid transformations of the underlying scene
– The problem of creating a local image descriptor that is invariant against said transformations

The input to the algorithm shall be single, two-dimensional colour or grayscale images of real-world scenes. Alternatively, if image sequences are used, no 3D rigid transformation between successive images is presumed, because in the general case one does not always exist. In many cases, there will be only 2D motion induced by camera motion, such as pan or zoom. This means that the algorithm cannot rely on 3D rigid motion being present, and thus cannot use the additional information that such motion would make it possible to extract, i.e., depth information.

The requirement for the interest point detector is repeatability under 3D rigid transformations. In [8], repeatability is defined as follows: “Repeatability is the variation in measurements taken by a single person or instrument on the same item. A measurement is said to be repeatable when this variation is small.” This means that if any point in one input image is detected as a point of interest, the same point should also be detected in another image which shows the same scene undergoing any 3D rigid transformation.
More precisely: Let p be a detected interest point in an image I that is the projection of a point w in the world onto the image plane of I, p = t(I, w), and let p' = t(I', w) be the projection of w onto the image plane of a second image I', taken from another viewpoint. Then, if p' is visible in I', it should also be detected as a point of interest in I'.

The repeatability criterion is crucial for applications such as image registration or feature-based motion estimation. Given two images, if matching is done only between the two sets of interest points detected in them, failing to meet the repeatability requirement will inevitably result in false matches. The same is true for image similarity matching, where missing interest points result in a decrease of the similarity measure.

The requirement for the local image descriptor is invariance against 3D rigid transformations. In [9], invariance is defined as follows: “An invariant is something that does not change under a set of transformations. The property of being an invariant is invariance.” In this case, considering the local nature of the image descriptors, the set of transformations consists of local 3D rigid transformations. The local image descriptor will be used to compare two points of interest by means of a similarity measure between their descriptors. Invariance therefore means that the descriptors of two projections of the same point in the world should be similar to each other, and that the descriptors of projections of two different points in the world should be similar if the neighbourhoods (in the world) of the two points are similar.
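The repeatability criterion formulated in this section can be illustrated with a small Python sketch. It is not part of the proposed approach; it only shows one possible way to measure how many interest points from a first image are found again in a second image. The function name `repeatability`, the pixel `tolerance`, and the greedy nearest-neighbour matching are assumptions made for illustration. The points of the first image are assumed to have already been mapped into the second image by a known projection, and to be visible there.

```python
import math


def repeatability(points_ref, points_other, tolerance=1.5):
    """Fraction of reference interest points re-detected in the second image.

    points_ref   -- (x, y) detections from the first image, already mapped
                    into the second image (hypothetical input; the mapping
                    needs known scene geometry and is not shown here).
    points_other -- (x, y) detections found in the second image.
    tolerance    -- assumed maximum pixel distance for two detections to
                    count as the same point.
    """
    if not points_ref:
        return 0.0
    unused = list(points_other)
    repeated = 0
    for px, py in points_ref:
        # Greedy step: pick the nearest still-unmatched detection
        # that lies within the tolerance.
        best = None
        best_dist = tolerance
        for q in unused:
            d = math.hypot(px - q[0], py - q[1])
            if d <= best_dist:
                best, best_dist = q, d
        if best is not None:
            unused.remove(best)  # a detection may confirm only one point
            repeated += 1
    return repeated / len(points_ref)


# Made-up coordinates, for illustration only.
mapped = [(10.0, 10.0), (50.0, 20.0), (80.0, 90.0), (30.0, 70.0)]
detected = [(10.5, 9.8), (49.0, 20.5), (120.0, 5.0)]
score = repeatability(mapped, detected)
```

With these made-up coordinates, two of the four mapped points have a detection within the tolerance, so `score` is 0.5. A detector that fully met the repeatability requirement would yield 1.0 for every pair of views of the same scene.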
Similar resources
A novel Local feature descriptor using the Mercator projection for 3D object recognition
Point cloud processing is a rapidly growing research area of computer vision. Introducing of cheap range sensors has made a great interest in the point cloud processing and 3D object recognition. 3D object recognition methods can be divided into two categories: global and local feature-based methods. Global features describe the entire model shape whereas local features encode the neighborhood ...
Think Locally, Fit Globally: Robust and Fast 3D Shape Matching via Adaptive Algebraic Fitting
In this paper, we propose a novel 3D free form surface matching method based on a novel key-point detector and a novel feature descriptor. The proposed detector is based on algebraic surface fitting. By global smooth fitting, our detector achieved high computational efficiency and robustness against non-rigid deformations. For the feature descriptor, we provide algorithms to compute 3D critical...
New Pseudo-CT Generation Approach from Magnetic Resonance Imaging using a Local Texture Descriptor
Background: One of the challenges of PET/MRI combined systems is to derive an attenuation map to correct the PET image. For that, the pseudo-CT image could be used to correct the attenuation. Until now, most existing scientific researches construct this pseudo-CT image using the registration techniques. However, these techniques suffer from the local minima of the non-rigid deformation energy f...
Object Recognition based on Local Steering Kernel and SVM
The proposed method is to recognize objects based on application of Local Steering Kernels (LSK) as Descriptors to the image patches. In order to represent the local properties of the images, patch is to be extracted where the variations occur in an image. To find the interest point, Wavelet based Salient Point detector is used. Local Steering Kernel is then applied to the resultant pixels, in ...
Place Recognition using Surface Entropy Features
In this paper, we present an interest point detector and descriptor for 3D point clouds and depth images, coined SURE, and use it for recognizing semantically distinct places in indoor environments. We propose an interest operator that selects distinctive points on surfaces by measuring the variation in surface orientation based on surface normals in the local vicinity of a point. Furthermore, ...
Journal title:
- TCDL Bulletin
Volume 2, Issue
Pages -
Publication date 2006